Performance Tuning of Small Scale Shared Memory Multiprocessor Applications using Visualisation

Author

  • Mats Brorsson
Abstract

Even though shared memory multiprocessors are becoming increasingly common, achieving high performance on parallel applications remains a formidable task. One of the main reasons is the large amount of implicit communication that a poorly structured program generates. This article shows the importance of performance visualisation for spotting cache coherence bottlenecks and locating their source. This is exemplified by a performance analysis tool, SM-prof, that visualises accesses to shared data structures so that problematic access patterns are highlighted. SM-prof maintains links from the visualisation to the actual source code lines responsible for the accesses. In contrast to earlier approaches, SM-prof shows the inherent data sharing of the application that would occur in any shared memory architecture. We demonstrate the merits of SM-prof by means of two detailed case studies.


Similar resources

Visualisation for Performance Tuning of DVSM Applications

Small organisations can now have access to high raw processing power using networks of workstations (NOW) as parallel computing platforms. Distributed Virtual Shared Memory (DVSM) packages have been developed to facilitate the programming of such systems. However, because of the high interprocess latencies in a NOW, the performance of a DVSM application is more susceptible to the partitioning o...


A Scaleable Multiprocessor Architecture with Multiple Read-Write Memory Model

This paper presents a scalable multiprocessor architecture with multiple access memories and multi-way busses. This parallel architecture, with a more intelligent memory model and an efficient multi-way interconnection network organization, is called the CRrCW (Concurrent Read and restricted Concurrent Write) scaleable multiprocessor system. The memory and network model provides concurrent memory acces...


Multiprocessor Memory Hierarchies

Keywords: parallel computer architecture; high performance system design; system bus; caches; memory hierarchies; shared memory machines

Memory latency, bandwidth, and locality of reference will play larger roles in future parallel systems as processors speed up relative to main memory latency. Using an instruction level PA-RISC multiprocessor simulator, we examined hardware and software techniques that ...


Brazos: A Third Generation DSM System

Brazos is a third generation distributed shared memory (DSM) system designed for x86 machines running Microsoft Windows NT 4.0. Brazos is unique among existing systems in its use of selective multicast, a software-only implementation of scope consistency, and several adaptive runtime performance tuning mechanisms. The Brazos runtime system is multithreaded, allowing the overlap of computation w...


Self-Tuned Distributed Multiprocessor System

Self-tuning, a technique devised by Ted Kehl, is a new clocking paradigm which incorporates the best features of conventional synchronous logic along with advantages offered by the asynchronous, self-timed paradigm. The principles of self-tuning are explained and an outline of the application of self-tuning to the interconnection network of a large scale distributed shared memory multiprocessor...




Publication date: 1997